This report is the second part of a study designed to construct a
test for measuring the musical aptitude of persons from various age
groups. It covers the construction of the test, the material, item
analysis, reliability, validity, and possible future steps. The test
is composed of musical recordings, determined from pilot studies,
which the test groups analyzed for acoustical structure. Three
versions of the test were developed to raise its reliability.
Patterns of relationships instead of absolute figures are examined
to show the test's validity, which is expressed in several tables in
the report. Findings indicate that there are no essential
differences between the item-total correlations and the deviations
of the items in the different versions of the test. The item-total
correlations show a relatively low but consistent positive relation.
A major conclusion is that a subject's age affects his test results
very little. This seems to support the theory that musical aptitude
develops at an early age. See ED 092 440 for a report on the first
part of this study, covering the background theory and pilot
studies. (ND)
Preface

As this is a direct sequel to the first part of this study
(Background Theory and Pilot Studies), the same persons are to be
thanked for help and advice. In addition to them, I wish to thank
the principal of the Music Institute of Vantaa, Olli Ruottinen, for
co-operation.

It seems to be difficult to find the right place for this kind of
study in any of the faculties of the university. This being the
case, I am especially grateful to the Institute of Education of the
University of Helsinki for the possibility of publishing studies.
1. Test construction



When the pilot studies were completed, the first test, here called
version A, was composed and recorded according to the experience
gained so far. The items were played directly on an electric organ,
keeping the tempo subjectively relatively fast, approximately the
same that was found to be suitable in the pilot studies. In addition
to what has already been said about the tempo (that it should be
faster than the test maker feels, to avoid making the test boring),
there is another reason for this. It can be supposed that too slow a
tempo gives the subjects an opportunity to reason out the right
answer, i.e., they have time to think about several possible
alternatives, eliminate the impossible ones, etc., without really
comprehending the holistic structure of the sounds. Because the test
is made to measure an intuitive organizing ability and not reasoning
ability, this could be a danger to its validity.

As in the pilot studies, pitch, length and intensity were used as
the bases for structure forming, i.e., one of them at a time varies
in each item. Timbre differs from item to item to make the test more
interesting: vibrato is used in some items, reverberation in others,
and so on.

To avoid measuring discriminating abilities, no very small
differences between the sounds were used. The smallest intervals
used are semitones; usually the intervals are bigger than that. In
the length items the longer sounds are two or more times longer than
the shorter ones. Correspondingly, the louder sounds are
approximately three or four times louder than the weak ones (exact
measures were not available).
The basic idea of the items has already been described in the first
part of this study (Karma 1973, 16). The subjects are to divide the
first part of the item into three similar parts in their minds and
then decide whether the second, "answer" part of the item is similar
to them. Version A consists of 40 items: in 15 of them pitch is the
varying factor, and in 13 and 12 items, respectively, length and
intensity are the bases for organizing.
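The answering rule can be stated as a small check. The Python sketch below is a hypothetical illustration, not part of the original test: it assumes that each sound (or drawn figure) is encoded as a symbolic label and that a valid first part is an exact threefold repetition; the function names are invented for the example.

```python
from typing import Sequence


def three_similar_parts(first_part: Sequence[str]) -> bool:
    """Return True if the sequence divides into three identical consecutive parts."""
    n = len(first_part)
    if n == 0 or n % 3 != 0:
        return False
    size = n // 3
    parts = [tuple(first_part[i * size:(i + 1) * size]) for i in range(3)]
    return parts[0] == parts[1] == parts[2]


def key_is_yes(first_part: Sequence[str], answer_part: Sequence[str]) -> bool:
    """Hypothetical scoring key: "yes" iff the answer part equals one of the three parts."""
    if not three_similar_parts(first_part):
        raise ValueError("first part does not divide into three similar parts")
    size = len(first_part) // 3
    return tuple(answer_part) == tuple(first_part[:size])


# The drawn example used in the instruction: AOO AOO AOO, answer part AOO.
print(key_is_yes("AOOAOOAOO", "AOO"))  # True
print(key_is_yes("AOOAOOAOO", "AOA"))  # False
```

The same functions accept lists of labels, e.g. pitch names such as `["C4", "E4", "G4"] * 3`, so the visual examples and the tape items follow one rule.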

Version B of the test was constructed by making an item analysis of
version A and choosing the 31 best items to form the new one. Some
technically imperfect items were also re-recorded.

Version C is similar to version B except that it is not played
directly but constructed by cutting and splicing a tape on which
long "basic" tones had been played on an electric organ. This was
done to avoid the possibility that small imperfections in the
playing would lower the reliability of the test. The cutting method
is relatively tedious but produces very exact results.

The exact lengths of the tones in the C version are as follows:

- In the items where pitch is the varying factor the tones follow
each other immediately without a pause. In the first seven items (of
which three are used as examples and for practising) the length of
each sound is 0.74 seconds (14 centimetres of tape at a speed of
19 cm/sec.); in the rest of the test the corresponding figures are
0.68 sec. (13 cm). This slight shortening of the items was done to
balance the subjective feeling that the tones become longer once the
subjects have got used to the nature of the test.

- In the items where pitch is held constant there must, naturally,
be a pause between the sounds. The time for this pause is taken from
each sound, i.e., the sum of a sound and a pause is the same as the
figures above. The length of a pause is 0.16 seconds (3 cm of tape).

- The pause between the first and the second part of an item is
3 seconds (57 cm) throughout the test.
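The tape lengths and durations above are related by the tape speed alone (duration = length / speed). The short Python check below reproduces the stated figures; its only inputs are the lengths and the 19 cm/sec. speed given in the text, and the last line merely derives the sounding part of a tone in the constant-pitch items from the rule that the pause is taken from each sound.

```python
TAPE_SPEED_CM_PER_S = 19.0  # tape speed given in the text


def tape_seconds(length_cm: float, speed: float = TAPE_SPEED_CM_PER_S) -> float:
    """Duration of a tape segment of the given length at the given speed."""
    return length_cm / speed


segments_cm = {
    "tone, first seven items": 14,
    "tone, rest of the test": 13,
    "pause between sounds": 3,
    "pause between first and answer part": 57,
}
for name, cm in segments_cm.items():
    print(f"{name}: {cm} cm -> {tape_seconds(cm):.2f} s")

# In constant-pitch items the pause is taken from each sound,
# so the sounding part of a 13 cm slot is 13 - 3 = 10 cm of tape.
print(f"sounding part, rest of the test: {tape_seconds(13 - 3):.2f} s")
```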

Versions B and C are written out in musical notation in the
appendix. The instruction is as follows:

The idea of dividing the first part of an item into three similar
parts is presented with a couple of drawn examples, such as the
following:

    AOOAOOAOO        AOO



The subjects are shown that there is only one possible way of
dividing the figure into three similar parts without leaving any
figures over. When the lines showing the cut-off points have been
drawn, the "answer" part is compared with the first series of
figures. When the drawn examples have become clear to all subjects,
they are told that the problems on the tape are of the same kind but
the series consist of sounds instead of visual figures. The subjects
are then made familiar with the test by letting them solve the three
example problems on the tape together.
2. Material

The material is somewhat fragmentary owing to practical
difficulties. It was mainly obtained in connection with selecting
pupils for music institutes. The institutes gave their own tests to
the applicants at the same time, and thus there was not much time
for tests not relevant to the selection. This is why, for example,
the intelligence tests have been given to only part of the subjects.
The different versions of the test have been used in the following
connections:

- Version A has been given to the applicants for the Music Institute
of Espoo in spring 1973 (N = 308). Some information about
intelligence, previous schooling and the tests of the institute is
also available.



- Version B has been given to groups of pupils in the Music
Institute of Kirkkonummi and the Pop & Jazz Institute of Oulunkylä*
in spring 1974. Sample sizes are 130 and 94, respectively. Teachers'
ratings of musical aptitude and achievement are also available for
part of this material.

- Version C has been given to a) the applicants for the music
institutes of Espoo (N = 245) and Kirkkonummi (N = 44), b) the
school class of the Institute of Education of the University of
Helsinki (third grade, N = 203), and c) pupils of the Music
Institute of Vantaa and some elementary school classes in Vantaa
(N = 133). Information about achievement, intelligence, training and
the tests of the institutes is also available for part of the
material. Version C was used during spring 1974.

When the figures above are summed up, an overall total of 974 is
attained. As to age, the subjects range from six-year-olds to
adults.

* The material from the Pop & Jazz Institute has been collected and
processed by Irmeli Himberg, Kauko Salmi and Sampo Suihko.



3. Item analysis

There were no essential differences between the item-total
correlations and the deviations of the items in the different
versions of the test. More detailed information is given for
version C because it is written out in notation (appendix) and
because its available numerus is bigger than that of version B.

The item-total correlations show a relatively low but consistent
positive relation. In this kind of situation, removing items does
not improve the reliability substantially. This being the case, all
items of all versions of the test are included when reliability and
validity are discussed, unless specifically mentioned otherwise.

The relatively low average correlations are probably mostly due to
two reasons: first, many items have been very easy and thus have
extreme p-values. These cannot correlate very highly with any
external variable. Second, the way of answering, the true-false
format of the test, makes random guessing relatively probable, which
in turn lowers the correlations.
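The item statistics discussed here can be computed from a matrix of 0/1 item scores. The Python sketch below is an illustration, not the author's procedure: it computes each item's p-value and its correlation with the total of the remaining items (a "corrected" item-total correlation; the report does not say whether the item itself was excluded from the total). An easy item has little variance, p(1 - p), which is why it cannot correlate highly with anything; an item that everybody solves has no variance at all, and its correlation is undefined.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable has zero variance."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return math.nan
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_analysis(responses: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Return (p-value, corrected item-total correlation) for each item.

    responses: one row per subject, one 0/1 score per item.
    """
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - s for t, s in zip(totals, item)]  # total without this item
        results.append((mean(item), pearson(item, rest)))
    return results


# Hypothetical scores of five subjects on four items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
for j, (p, r) in enumerate(item_analysis(scores), start=1):
    print(f"item {j}: p = {p:.2f}, r = {r:.2f}")
```

The first item, solved by everybody, prints `r = nan`: an extreme p-value leaves nothing to correlate.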

It is a common phenomenon that it is difficult to construct good
items to which the right answer is "yes"; it is much easier to make
a wrong alternative look right than vice versa. This is also true
here. The average p-value of the items to which the right answer is
"yes" is .87, and the average p-value of the "no" items is .78. This
means that the "no" items discriminate better. They also seem to be
better in terms of item-total correlations: if the 10 best items are
chosen, nine of them are "no" items.

Table 1. Item-total correlations (r_it), p-values (p) and ...



4. Reliability

Table 2. Reliability coefficients

             Coefficient alpha                Retest reliability

version A    .66 (N = 286, does not           .57 (correlation between
             include 6- and 7-year-old        versions A and B, N = 37)
             subjects)

version B    .51 / .61 (before and after      .68 (N = 27)
             removing 10 items, N = 94)
             .58 (.66)1) (N = 130)

version C    .55 (.61)1) (N = 309)

1) The coefficients in parentheses are the reliabilities
   Spearman-Brown-corrected to the length of 40 items. This has been
   done to ease the comparison between the different versions of
   the test.
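The correction in footnote 1) is the Spearman-Brown formula, r' = n r / (1 + (n - 1) r), where n is the ratio of the new length to the old one. The Python sketch below applies it to version C (31 items, .55) and reproduces the tabled .61; it assumes, as the formula itself does, that the added items would be parallel to the existing ones.

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Reliability predicted when a test is lengthened by length_factor
    with items parallel to the existing ones (Spearman-Brown formula)."""
    return length_factor * r / (1 + (length_factor - 1) * r)


# Version C: 31 items, coefficient .55, projected to 40 items.
print(round(spearman_brown(0.55, 40 / 31), 2))  # 0.61
```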
The basic reason for making the different versions of the test was
an attempt to raise the reliability by making the tape technically
better. It was thought that the small differences, e.g. in the
lengths of sounds intended to be of the same length, could be an
important source of unreliability. However, the reliabilities could
not be raised in this way, which shows that the small inexactnesses
in the tape are not important in general, although they may have an
effect in some individual cases.

The obvious reason for the relatively low reliability of the test
is, then, the true-false format of the items, which makes it
possible to guess right in 50 % of the cases. Oosterhof and
Glasnapp (1974) have compared the reliabilities of the true-false
and four-alternative multiple-choice formats empirically. According
to their results, the approximate reliability of .60 obtained here
would be in the region of .85-.90 if there were four alternatives to
choose from in every item. Although using the multiple-choice format
in this test has proved difficult compared with tests in which the
problems are presented on paper, its advantages are so evident that
it seems worth trying. The main concern is probably how this could
be done without affecting the validity. One possible practical
solution is presented in chapter 6.
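To see why guessing depresses reliability, the Python sketch below simulates a hypothetical response model: a subject either "knows" an item, with a logistic probability depending on an invented ability and item difficulty, or guesses blindly with success probability 1 / (number of alternatives); coefficient alpha is then computed for 31 items. All parameters (1000 subjects, difficulties between -1.5 and 1.5, the logistic link) are assumptions for illustration. This is not Oosterhof and Glasnapp's method, and the printed values are not meant to reproduce theirs, only to show the direction of the effect.

```python
import math
import random
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[int]]) -> float:
    """Coefficient alpha for a subjects-by-items score matrix."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulated_alpha(n_alternatives: int, n_subjects: int = 1000,
                    n_items: int = 31, seed: int = 1) -> float:
    """Alpha under a hypothetical 'know or guess blindly' response model."""
    rng = random.Random(seed)
    difficulties = [rng.uniform(-1.5, 1.5) for _ in range(n_items)]
    scores: List[List[int]] = []
    for _ in range(n_subjects):
        ability = rng.gauss(0.0, 1.0)
        row = []
        for b in difficulties:
            knows = rng.random() < 1 / (1 + math.exp(-(ability - b)))
            lucky = rng.random() < 1 / n_alternatives  # matters only if not known
            row.append(1 if knows or lucky else 0)
        scores.append(row)
    return cronbach_alpha(scores)


print(f"true-false (2 alternatives):      alpha = {simulated_alpha(2):.2f}")
print(f"multiple choice (4 alternatives): alpha = {simulated_alpha(4):.2f}")
```

Both runs use the same seed, so they share the same abilities and the same random draws and differ only in the chance of a lucky guess; the gap between the two alphas therefore isolates the effect of the answer format.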

Reliability coefficients provide some information about the internal
construction of the test, too. Coefficient alpha is a measure of the
internal consistency of the test, and retest reliability gives
information about the stability of the test over time. It would
therefore be reasonable to expect higher numerical values for the
retest coefficients if there were subscales in the test, i.e., if it
were not as internally consistent as its results are reproducible.
Because the different coefficients are very close to each other, it
can be concluded that there are no clear subscales in the test.
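For reference, the two coefficients compared here are conventionally defined as follows, for a test of k items with item variances \sigma_i^2, total-score variance \sigma_X^2, and total scores X_1 and X_2 from two administrations. Alpha rises as the items covary more strongly with each other, whereas the retest coefficient reflects only the stability of the total score; subscales would lower the first without necessarily lowering the second.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
r_{tt} = \operatorname{corr}(X_1, X_2)
```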
5. Validity

In the typical case when a test is made, there is no direct and
reliable measure of the property aimed at; if there were one, making
the test would usually be unnecessary. This makes it much more
difficult to determine the validity of a test than its reliability.
There are different ways of solving this problem; the strategy that
was considered best in this case was to find a pattern of relations
instead of a single maximized measure. This pattern of relations can
be compared to the relations which are hypothesized to be present if
the test measures the right theoretical concept (and if such a
concept has a correspondence in reality). In other words, construct
validity is the main concern in this chapter (for construct validity
see, e.g., Cronbach 1966, 120; Nunnally 1967, 63). Focusing on the
pattern of relations instead of the absolute figures is also
reasonable here because the relatively low reliability of the test
tends to cause "shrinkage" in the correlations, i.e., their level may
be lower than it would be with a more reliable test, although there
is no reason for the relations between the correlations to change.
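The "shrinkage" argument can be made concrete with Spearman's classical correction for attenuation, which the report does not itself apply: dividing an observed correlation by the square root of the two reliabilities estimates the correlation between error-free scores. The Python sketch below uses hypothetical correlations of .30 and .10, a test reliability of .60 (the approximate level reported in chapter 4), and criteria assumed to be error-free. Because every correlation of the test is divided by the same factor, their level rises but their ratio, i.e. the pattern, stays the same.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float = 1.0) -> float:
    """Spearman's correction for attenuation: estimated correlation between
    true scores, given the reliabilities of the two measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical correlations of the test with two criteria.
r_high, r_low = 0.30, 0.10
print(round(disattenuate(r_high, 0.60), 2))  # 0.39
print(disattenuate(r_high, 0.60) / disattenuate(r_low, 0.60))  # ratio stays about 3
```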

The following tables present correlations between the different
versions of the test and some other measures. Because not all
measures are available for all subjects, there are empty entries in
the matrices. The variables in the matrices are as follows:

Playing. The applicants for the music institutes were to play
something if they had any previous experience with any instrument.
The performance, prepared in advance, was rated by experienced
instrument teachers.

Singing. The applicants were also to prepare a little song or tune
that they either sang or hummed. This was rated by the same judges
as above.
Tests developed at the institutes. The most common instrument in
selecting pupils for music institutes in Finland is a test where the
subjects are to hum or whistle a given melody, reproduce some tones
at the right pitch, tap given rhythms and the like. Although there is
no standardized test of this kind, the variation between the
institutes is small.

Former training in music. Information about former training in music
was given by the applicants for the institutes. Although it was
weighted by rating the effectiveness of the various kinds of
training (music classes* in schools, group instruction, individual
teaching, etc.), this variable is probably relatively unreliable. The
effects of musical or unmusical homes and the like could not be
controlled. Thus this measure must be taken as a hint only.

School mark in music is in most cases an ordinary class 
teacher's rating of the student's achievement in the subject 
"music". 

Sentence completion is a subtest from Heinonen's battery of factor
tests of intelligence (Heinonen 1963). This test was used as an
operationalization of general intelligence. There was a lack of
time, and this short test was considered to give information about
general intelligence although, more exactly, it is of course a test
of verbal reasoning (Heinonen's own factor analyses support this
decision).

Mirror-test. This test is also taken from Heinonen's battery. It was
used to test the hypothesis about the relation between musical and
spatial ability (Karma 1973). The items consist of figures which are
either similar to a model figure or mirror images of it. The
subject's task is to separate the mirror images from the other
figures.

* The term "music class" is used here to refer to classes having an
additional amount of teaching in music compared to ordinary classes.
The pupils are selected for these classes according to interest in
music and musical aptitude.
Music institute teacher's rating of aptitude. Instrument teachers
who teach their pupils individually were asked to rate the aptitude
of their pupils, trying to keep their judgments free from the
effects of the pupils' motivation and the amount of training.

Music institute and music class teachers' ratings of achievement.
These are mostly ratings of progress in instrument playing, sight
singing and the like.



Table 3. Correlation matrices. A-, B- and C-versions of the
structuring test and some other measures. For more detailed
information about the variables, see the text.

Table 3.1. Version A

                               1.          2.          3.      4.
1. Version A
2. Playing                    .20 (99)
3. Singing                    .06 (99)    .52 (99)
4. Tests of the institutes    .26 (99)    .41 (99)    .58
5. Former training            .23 (99)    .41 (99)    .21     .16 (99)
6. Sentence completion        .09 (106)
7. Mirror-test                .33 (48)

Table 3.2. Version B

                                        1.                  2.    3.
1. Version B
2. Teacher's rating of aptitude        .76 (25)1)
3. Teacher's rating of achievement     .15 (91)            2)
4. Former training                     [illegible] (116)         [illegible] (116)

1) Because not all measures are available for all subjects, the
   corresponding numerus is given after every figure.
2) Missing information
Table 3.4. Correlations of the A-, B- and C-versions with the
external criteria. A summary of tables 3.1, 3.2 and 3.3.

                                      Version of the test
Criterion                             A            B           C
Playing                               .20 (99)1)   -           .24 (231)
Singing                               .06 (99)     -           .12 (238)
Tests of the music institutes         .26 (99)     -           .33 (322)
Music institute and music class
  teachers' rating of achievement     -            .15 (91)    .53 (54)
Music institute teacher's
  rating of aptitude                  -            .76 (25)    -
Former training                       .23 (99)     .13 (118)   .01 (240)
School mark in music                  -            -           .05 (222)
Sentence completion                   .09 (106)    -           .09 (104)
Mirror-test                           .33 (48)     -           .33 (89)

1) Numerus in parentheses
-  Missing information
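Because the measures were collected for different subgroups, each coefficient in the tables above is based only on the subjects who have both scores, and its numerus is reported with it. The Python sketch below shows this pairwise-deletion approach; encoding a missing score as `None`, the function name and the sample data are assumptions for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def pairwise_r(x: Sequence[Optional[float]],
               y: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Pearson r over subjects with both scores present, plus that numerus."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, n
    return sxy / math.sqrt(sxx * syy), n


test_scores = [21, 25, None, 28, 19, 30]  # hypothetical structuring-test scores
ratings = [3, 4, 5, None, 2, 5]  # hypothetical teacher ratings
r, n = pairwise_r(test_scores, ratings)
print(f"{r:.2f} ({n})")  # printed in the tables' "r (N)" style
```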
The following comments may be of help when the tables are
interpreted:

When singing is rated, the main sources of variance are the quality
of voice and exactness of pitch. This is in line with the practical
experience of the author, too. Thus, no strong relation to
organizing ability is to be expected.

The tests of the institutes are probably clearly loaded on
organizing (or structuring) ability but also have a strong
connection with producing capabilities. This can be supposed to
lower the correlation with the structuring test. This view is
supported by the fact that the tests of the institutes correlate
highest (.51-.58) with singing.

The correlation with ratings of aptitude is probably an
overestimate caused by the small numerus of this variable. It is
hard to make teachers estimate their pupils' aptitudes when they are
used to judging achievement. As a matter of fact, a great deal of
the estimates meant to be ratings of aptitude proved to be ratings
of achievement when this was checked afterwards. So only 25 ratings
of aptitude are left in the tables.

The school mark in music has quite little to do with aptitude when
ordinary classes are concerned. The strongest factors forming the
school marks are probably singing and interest in music (the
correlation with singing is .48). It should be remembered, however,
that a relatively strong relation (.60) was found in the pilot
studies, too (Karma 1973).
Because of the incompleteness of the data, no formal factor analysis
was performed. An "armchair factor analysis"* seems, however, to
suggest the following three factors (the analysis is mainly based on
version C (table 3.3), which is the most complete):

I. Producing, mainly singing. The correlations on which this is
based are the following:

                                    1.     2.     3.
1. Singing
2. Tests of the institutes         .51
3. School mark in music            .48    .34
4. Playing                         .3?    .27    .28



II. Structuring ability. This would be based on the following
correlations:

                                    1.     2.     3.
1. The structuring test
2. Tests of the institutes         .31
3. Music class teacher's rating
   of achievement                  .53
4. Mirror-test                     .33    .17



III. Former training. This would be indicated by the correlation
between former training and playing (.32 in version C, and .41 in
version A, table 3.1).

As a conclusion, it may be said that the validity data support the
theoretical background presented in the first part of this study. A
great deal of the variance in other measures of musical aptitude can
be explained using the concept "ability to structure acoustic
material", although it is too early to say that musical aptitude is
the structuring ability. The supposed relation to spatial ability
also seems to be present. The difference between the correlations
with general intelligence (verbal reasoning) and with spatial
ability is exactly the same in the 1973 and 1974 material. The
difference is statistically significant at the 10 % level
(A-version), at the 5 % level (C-version) and at the 2 % level in
the combined material. It also seems that the structuring ability
depends very little on former training.

* The term is adopted from Kerlinger (1973, 691) and refers to
subjective viewing of the correlation matrix for estimating its
possible factorial structure.



6. Discussion and next steps 

The lower age limit of the subjects seems to be determined by their
ability to read and write. Although some six-year-old subjects have
successfully taken the test, it seems appropriate not to give the
test to subjects under eight years of age. The test has not yet been
tried individually, however. This could make it possible to test
younger subjects; written answers could in this case be replaced by
oral ones.

The subject's age seems to affect his results very little. Only
about two points out of thirty-one were enough to balance the effect
of age between eight-year-old and adult subjects. Compared to the
effect of age on, say, intelligence tests, this is surprisingly
little. This seems to support the common view that musical aptitude
develops at an early age.

Using timbre to make the test more interesting and keeping it as
short as possible seem to have been good decisions. Several subjects
have spontaneously said that the test was nice and interesting. The
younger the subjects are, the more difficult and important it is to
keep them motivated. Lack of motivation may affect the validity of
the test.
When the first pilot studies with the test were made, it was noted
that the usual multiple-choice format did not work in this kind of
test. If several alternative answers were played after the first
part of an item, two problems arose: first, the subjects tended to
forget the beginning of the item and, second, it was easy to guess
the right answer by watching when the other subjects marked their
papers. This is why the true-false format was chosen for the test.
Because of the unreliability of this kind of test, however, a way of
using the multiple-choice format without these drawbacks should be
developed for future versions of the test. The solution that will be
used in the next version is the following: instead of fixing the
number of similar parts in the series of sounds and making the
subjects figure out what one part is like, the number of parts
varies, and the task of the subjects is to determine how many
similar parts the series consists of. For example, the right answer
to the following item would be "three", because the series of sounds
can be divided into three similar subseries:



(Musical example: a three-note figure played three times in
succession.)
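The planned task, counting how many similar parts a series consists of, amounts to finding the shortest part length that tiles the series exactly. The Python sketch below is a hypothetical illustration: it assumes that notes are encoded as symbolic labels, and the function name is invented for the example.

```python
from typing import Sequence


def count_similar_parts(series: Sequence[str]) -> int:
    """Largest number of identical consecutive parts the series divides into.

    Candidate part lengths are tried from the shortest upwards; the first one
    that divides the series evenly and repeats exactly gives the answer.
    """
    n = len(series)
    if n == 0:
        raise ValueError("empty series")
    for size in range(1, n + 1):
        if n % size == 0 and all(series[i] == series[i % size] for i in range(n)):
            return n // size
    raise AssertionError("unreachable: size == n always matches")


# Hypothetical encoding of the example above: one long and two short notes, three times.
print(count_similar_parts(["long", "short", "short"] * 3))  # 3
```

Because the shortest repeating part wins, a series of six identical tones would count as six parts rather than three; under this rule, items would have to be built so that a part is not itself internally repetitive.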

The instruction has been a source of confusion in some cases. It is
difficult for some subjects to understand the relation between the
visual examples and the auditory items of the test. Thus the visual
examples will be abandoned, and several easy tape-recorded examples
will be used to make the subjects familiar with the nature of the
problems.

In addition to developing the test itself, the relation of
structuring ability to other abilities and personality traits will
be investigated by giving the test to subjects about whom this
information is available. If practical possibilities allow, the
relation to some standardized tests of musical aptitude will also be
examined.